1.  INTRODUCTION


Stepwise procedures are often used to select concomitant
variables in regression with censored data (Krall, Uthoff & Harley,
1975; Peduzzi, Hardy & Holford, 1980; Lee, Harrell, Tolley & Rosati,
1983). An undesirable feature of stepwise procedures is that they
lead to a single subset of variables and do not suggest alternative
good subsets. Another concern is the possibility of premature
termination. We shall see later that this is indeed the case when
stepwise procedures are applied to the multiple myeloma data
(Krall, Uthoff & Harley, 1975). In comparison, all subsets
regression provides more information, is more reliable and is to
be preferred, provided that it is computationally feasible. The
purpose of this paper is to show that within the framework of the
proportional hazards model (Cox, 1972) all subsets regression can
be performed with very little computational effort.


The first selection criterion that we consider is based on
cross-validation. We argue that the correct way to cross-validate
in our setting is to change the status of one observation from
uncensored to censored. An asymptotically equivalent criterion
that requires less computation is the following: if $\alpha$ indexes
the model, choose $\alpha$ to minimize

$$A_\alpha = W_\alpha + 2P_\alpha, \qquad (1)$$

where $W_\alpha$ denotes the partial likelihood ratio statistic for
testing the model $\alpha$ against the full model and $P_\alpha$ is the number
of covariates included in model $\alpha$. The criterion that
requires least computation is
$$A^*_\alpha = W^*_\alpha + 2P_\alpha, \qquad (2)$$

where $W^*_\alpha$ denotes the Wald statistic for the same test. We show that
criterion $A^*_\alpha$ is formally equivalent to Mallows' $C_p$. As a result,
we are able to compute and compare the values of $A^*_\alpha$ for all possible
subsets using standard statistical packages. We apply this to
the multiple myeloma data and obtain results remarkably different
from those obtained by previous workers using stepwise procedures.
New insights are gained, and the superiority of all subsets
regression over stepwise regression is clearly demonstrated.
2.  CRITERIA FOR SELECTING VARIABLES

The proportional hazards model is specified by the hazard
relationship

$$\lambda(t; x) = \lambda_0(t)\exp(x\beta),$$

where $x$ is a row vector of $P$ covariates, $\beta$ is a column vector of
$P$ regression constants and $\lambda_0(t)$ is an arbitrary and unspecified
baseline hazard function. Let $t_i$, $i = 1, \ldots, n$, be an observed
sample of failure times, with $\delta_i = 0$ indicating a right-censored
observation and $\delta_i = 1$ indicating a failure. The associated covariate
vectors are $x_1, \ldots, x_n$. An estimate of $\beta$ is obtained by maximizing
the partial likelihood
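$$L(\beta) = \prod_{i=1}^{k} \left[ \frac{\exp(x_{(i)}\beta)}{\sum_{l \in R_{(i)}} \exp(x_l\beta)} \right]^{\delta_{(i)}}, \qquad (3)$$

where $t_{(1)} < \cdots < t_{(k)}$ are the uncensored failure times with
corresponding covariates $x_{(1)}, \ldots, x_{(k)}$, censoring statuses
$\delta_{(1)}, \ldots, \delta_{(k)}$ (initially all equal to one), and $R_{(i)}$ is the set
of individuals known to be alive just prior to $t_{(i)}$. By setting different
components of $\beta$ to zero, we obtain $2^P$ submodels of the full model.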
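The role of the exponents $\delta_{(i)}$ in (3) can be checked numerically. The Python sketch below is illustrative only: it assumes NumPy and no tied failure times, takes the risk set at a failure time to be every subject whose observed time is at least that time, and uses a hypothetical helper name, `log_partial_likelihood`, on made-up toy data. It shows that switching one failure's status from one to zero removes exactly that failure's term while leaving the subject in earlier risk sets.

```python
import numpy as np

def log_partial_likelihood(beta, times, status, X):
    """Log of the partial likelihood (3): sum over failures of log(term_i).

    Assumes no tied failure times. A subject with status 0 contributes no
    term of its own but still sits in the risk sets of earlier failures.
    """
    times = np.asarray(times, dtype=float)
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    total = 0.0
    for i in np.flatnonzero(status):          # only failures (status 1) give terms
        at_risk = times >= times[i]            # risk set just prior to times[i]
        total += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return total

# Toy data (made up for illustration): one covariate, five subjects.
times = np.array([2.0, 3.0, 5.0, 7.0, 11.0])
status = np.array([1, 0, 1, 1, 0])
X = np.array([[0.5], [1.0], [-0.3], [0.8], [0.0]])
beta = np.array([0.7])

full = log_partial_likelihood(beta, times, status, X)

# Switch the failure at t = 5 from uncensored (1) to censored (0).
status_cv = status.copy()
status_cv[2] = 0
dropped = log_partial_likelihood(beta, times, status_cv, X)
# full - dropped is the log of the t = 5 term alone.
```

Because each term is a conditional probability below one, its logarithm is negative, so removing it raises the log partial likelihood.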

We propose the following criterion for model choice: if $\alpha$
indexes the model, choose $\alpha$ to maximize

$$\prod_{i=1}^{k} \frac{\exp(x_{(i)}\hat\beta_{(i)}(\alpha))}{\sum_{l \in R_{(i)}} \exp(x_l\hat\beta_{(i)}(\alpha))}, \qquad (4)$$

where $\hat\beta_{(i)}(\alpha)$ denotes the maximum partial likelihood estimate of $\beta$
computed under model $\alpha$ when $\delta_{(i)}$ is changed from one to zero. We
call (4) the cross-validatory criterion for two reasons. Firstly,
we note that changing $\delta_{(i)}$ from one to zero corresponds to removing
the $i$th term from the partial likelihood (3). This is mathematically
similar to ordinary cross-validation, where the effect of
deleting one observation is to remove one term from the likelihood
function. Secondly, we note that the $i$th term of (3) is the
conditional probability

$$P\{\text{death of } (i) \text{ at time } t_{(i)} \mid \text{one death in } R_{(i)} \text{ at time } t_{(i)}\}. \qquad (5)$$
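It would be unrealistic to assess the choice of $\alpha$ with

$$\prod_{i=1}^{k} \frac{\exp(x_{(i)}\hat\beta(\alpha))}{\sum_{l \in R_{(i)}} \exp(x_l\hat\beta(\alpha))}, \qquad (6)$$

where $\hat\beta(\alpha)$ is the maximum partial likelihood estimate under model $\alpha$
computed from all the data. Since the information that $(i)$ died at $t_{(i)}$ is used to obtain
$\hat\beta(\alpha)$, the $i$th term of (6) is a biased estimate of (5). If we
change $\delta_{(i)}$ from one to zero, we still retain the information
that $(i)$ is alive just prior to $t_{(i)}$, but the fact that $(i)$ died
at $t_{(i)}$ is no longer used. This is very similar in spirit to
ordinary cross-validation.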
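A deliberately brute-force sketch of the cross-validatory criterion follows. It assumes NumPy, no tied failure times and at least one covariate in the model, and it works on the log scale (a sum of log terms rather than the product in (4)), which does not change the maximizing model. The helpers `loglik_grad_hess`, `fit_cox` and `cv_criterion` are hypothetical names, the Newton-Raphson fit with step halving is just one simple choice of fitting method, and the data are synthetic. Note that every failure in every model needs its own iterative refit, which is why the criterion is expensive.

```python
import numpy as np

def loglik_grad_hess(beta, times, status, X):
    """Log partial likelihood, gradient and Hessian (no tied failure times)."""
    eta = X @ beta
    w = np.exp(eta)
    p = X.shape[1]
    ll, grad, hess = 0.0, np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero(status):
        r = times >= times[i]                  # risk set just prior to times[i]
        wr, Xr = w[r], X[r]
        s0 = wr.sum()
        xbar = wr @ Xr / s0                    # risk-weighted covariate mean
        ll += eta[i] - np.log(s0)
        grad += X[i] - xbar
        hess -= (Xr * wr[:, None]).T @ Xr / s0 - np.outer(xbar, xbar)
    return ll, grad, hess

def fit_cox(times, status, X, max_iter=50, tol=1e-10):
    """Newton-Raphson with step halving; assumes at least one covariate."""
    beta = np.zeros(X.shape[1])
    ll, g, H = loglik_grad_hess(beta, times, status, X)
    for _ in range(max_iter):
        step = np.linalg.solve(H, -g)          # Newton direction for a maximum
        t = 1.0
        while True:
            new = beta + t * step
            ll_new, g_new, H_new = loglik_grad_hess(new, times, status, X)
            if ll_new >= ll or t < 1e-8:
                break
            t /= 2.0                           # halve the step if it overshoots
        done = abs(ll_new - ll) < tol
        beta, ll, g, H = new, ll_new, g_new, H_new
        if done:
            break
    return beta, ll

def cv_criterion(times, status, X):
    """Log of criterion (4): each failure's term evaluated at the estimate
    refitted after that failure's status is switched from 1 to 0."""
    total = 0.0
    for i in np.flatnonzero(status):
        status_i = status.copy()
        status_i[i] = 0
        beta_i, _ = fit_cox(times, status_i, X)
        eta = X @ beta_i
        r = times >= times[i]
        total += eta[i] - np.log(np.exp(eta[r]).sum())
    return total

# Synthetic data: covariate 0 affects the hazard, covariate 1 does not.
rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 2))
t_true = rng.exponential(scale=np.exp(-X @ np.array([0.8, 0.0])))
t_cens = rng.exponential(scale=2.0, size=n)
status = (t_true <= t_cens).astype(int)
times = np.minimum(t_true, t_cens)

for cols in ([0], [1], [0, 1]):
    Xa = X[:, cols]
    _, ll_fit = fit_cox(times, status, Xa)     # log of (6) for this model
    print(cols, round(cv_criterion(times, status, Xa), 3), round(ll_fit, 3))
```

The printout also illustrates the bias argument above: because $\hat\beta_{(i)}(\alpha)$ maximizes the partial likelihood without the $i$th term, each held-out log term can be no larger than the corresponding in-sample term, so the cross-validatory value never exceeds the log of (6).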

While the cross-validatory criterion is of theoretical
interest, its computation is prohibitive. For each of the $2^P$
models, we have to compute $\hat\beta_{(i)}(\alpha)$, $i = 1, \ldots, k$, each of which requires
iteration. By following the proof of Stone (1977), we can show
the asymptotic equivalence of model choice by cross-validation
and by criterion $A$. In particular, if $\alpha_1 \subset \alpha_2$ and $\alpha_1$ is correct,
then the asymptotic significance level of either criterion (the
asymptotic probability of choosing $\alpha_2$) is
$P(\chi^2_\nu > 2\nu)$, where $\nu = P_{\alpha_2} - P_{\alpha_1}$. Criterion $A$ as defined in (1)
requires less computation, since we only need to compute one $\hat\beta$
for each model. In spite of this, the task remains formidable.
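For integer $\nu$ the level $P(\chi^2_\nu > 2\nu)$ can be evaluated without tables, using the closed-form chi-square upper tail for even and odd degrees of freedom. The sketch below uses only the Python standard library; the function names are illustrative.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi^2_df > x) for a positive integer df, via the closed forms."""
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2)
    #         * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total

def significance_level(nu):
    """Asymptotic P(choose alpha_2 | alpha_1 correct), nu = P_alpha2 - P_alpha1."""
    return chi2_upper_tail(2 * nu, nu)

for nu in range(1, 6):
    print(nu, round(significance_level(nu), 4))
```

The level falls as $\nu$ grows, so the criterion becomes less likely to admit a block of unnecessary covariates the larger that block is.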

An alternative criterion is $A^*$, which is based on the Wald statistic.
Peace & Flora (1978) and Lee, Harrell, Tolley & Rosati (1983) compare
the partial likelihood ratio statistic and the Wald statistic
and find them comparable in assessing the effects of concomitant
variables in survival analysis. We also find good agreement
between $A$ and $A^*$. Criterion $A^*$ as defined in (2) is computationally
simpler. If we partition $\hat\beta^T$ as $(\hat\beta_1^T, \hat\beta_2^T)$, where $\hat\beta$ is
obtained under the full model, then $W^*_\alpha = \hat\beta_2^T C_{22}^{-1} \hat\beta_2$, where without
loss of generality we have assumed that model $\alpha$ corresponds to
setting $\beta_2 = 0$, and

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix},$$

the inverse of the information matrix, is the estimated covariance
matrix of $\hat\beta$. Lawless & Singhal (1978) note that if we begin with
the matrix (7), then by operating on it with a sequence of sweep operations,
we can obtain $W^*_\alpha = \hat\beta_2^T C_{22}^{-1} \hat\beta_2$ for all $2^P$ models. Instead of (7), we
use the matrix

$$\begin{pmatrix} C^{-1} & C^{-1}\hat\beta \\ \hat\beta^T C^{-1} & \hat\beta^T C^{-1}\hat\beta + N - P - 1 \end{pmatrix}, \qquad (8)$$

where $N$ is an arbitrary integer greater than $P$. This matrix can
be obtained easily since $\hat\beta$ and $C$ are the standard output of
any program that does Cox regression. The program that we use is
the BMDP program P2L. If we treat (8) as if it were the matrix
of corrected sums of squares and cross products of the independent
and dependent variables computed from a sample of size $N$, then
criterion $A^*$ is formally equivalent to Mallows' $C_p$. To see this,
note that

$$C_p(\alpha) = \frac{\mathrm{RSS}(\alpha)}{\hat\sigma^2} + 2(P_\alpha + 1) - N,$$

where $\hat\sigma^2 = \mathrm{RSS}(\text{full model})/(N - P - 1) = 1$ by our choice of (8).
Since $\mathrm{RSS}(\alpha) = \mathrm{RSS}(\text{full model}) + \hat\beta_2^T C_{22}^{-1} \hat\beta_2 = N - P - 1 + W^*_\alpha$,

$$C_p(\alpha) = A^*_\alpha - (P - 1),$$

and the two criteria are equivalent. The problem can now be
handled by standard statistical packages. We use the BMDP
program P9R, which does all subsets linear regression.
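The equivalence can be checked numerically. The sketch below assumes NumPy and uses a synthetic positive-definite matrix as a stand-in for $C$ and a random vector for $\hat\beta$ (not output from a real Cox fit). The hypothetical helper `pseudo_sscp` builds the matrix assumed here for (8), namely $C^{-1}$, $C^{-1}\hat\beta$ and a corner entry $\hat\beta^T C^{-1}\hat\beta + N - P - 1$, chosen so that the full-model RSS is $N - P - 1$ as the argument requires. Sweeping on the variables kept in each of the $2^P$ subsets reads off $\mathrm{RSS}(\alpha)$, and $C_p(\alpha)$ is compared with $A^*_\alpha - (P - 1)$, where $A^*_\alpha$ comes directly from the Wald statistic. The sweep uses a common Goodnight-style convention; this illustrates the argument and is not the algorithm used by BMDP P9R.

```python
import itertools
import numpy as np

def sweep(A, k):
    """Return A swept on pivot k (Goodnight-style sweep operator)."""
    A = A.copy()
    d = A[k, k]
    A[k, :] /= d
    for i in range(A.shape[0]):
        if i != k:
            b = A[i, k]
            A[i, :] -= b * A[k, :]
            A[i, k] = -b / d
    A[k, k] = 1.0 / d
    return A

def pseudo_sscp(beta, C, N):
    """Assumed form of (8): treat C as (X'X)^{-1} from a sample of size N,
    with the y'y entry chosen so that RSS(full model) = N - P - 1."""
    P = len(beta)
    Cinv = np.linalg.inv(C)
    M = np.empty((P + 1, P + 1))
    M[:P, :P] = Cinv
    M[:P, P] = Cinv @ beta
    M[P, :P] = Cinv @ beta
    M[P, P] = beta @ Cinv @ beta + (N - P - 1)
    return M

def wald(beta, C, keep):
    """W* = b2' C22^{-1} b2 for the coefficients excluded from `keep`."""
    drop = [j for j in range(len(beta)) if j not in keep]
    if not drop:
        return 0.0
    b2 = beta[drop]
    return float(b2 @ np.linalg.solve(C[np.ix_(drop, drop)], b2))

def mallows_cp(M, keep, N):
    """Mallows' Cp with sigma^2 = 1; RSS(alpha) is the corner after sweeping `keep`."""
    A = M
    for k in keep:
        A = sweep(A, k)
    return A[-1, -1] + 2 * (len(keep) + 1) - N

# Synthetic stand-ins for the Cox regression output (not real data).
rng = np.random.default_rng(1)
P, N = 4, 100                       # N is arbitrary; it cancels in Cp - A*
L = rng.normal(size=(P, P))
C = L @ L.T / P + 0.1 * np.eye(P)   # positive-definite "covariance of beta_hat"
beta = rng.normal(size=P)
M = pseudo_sscp(beta, C, N)

results = []
for size in range(P + 1):
    for keep in itertools.combinations(range(P), size):
        a_star = wald(beta, C, keep) + 2 * size
        results.append((mallows_cp(M, keep, N), a_star, keep))
results.sort()                      # all 2^P subsets ranked by Cp, i.e. by A*
```

Since $C_p(\alpha)$ and $A^*_\alpha$ differ by the constant $P - 1$ in every row, ranking the subsets by either gives the same order.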


3.  AN  EXAMPLE 


Krall, Uthoff & Harley (1975) present a data set consisting
of the survival times of sixty-five multiple myeloma patients
with sixteen concomitant variables. Seventeen of the observations
are censored. They assume an exponential regression model in
which the mean survival time is a linear function of the concomitant
variables. A step-up procedure based on the likelihood ratio
criterion leads to the subset {1,2,16}. Lawless & Singhal (1978)
adopt a proportional hazards exponential regression model

$$\lambda(x) = \lambda_0 \exp(x\beta).$$

They report the best three subsets of each size, according to
the value of the likelihood ratio statistic and the Wald statistic.
The best subset of size two is {1,2}. To compare subsets of
different sizes, we use Akaike's criterion and end up with {1,2}
as the best subset. Unfortunately, Lawless & Singhal do not
consider all sixteen concomitant variables of the original data
set. Instead, they consider only variables 1, 2, 3, 5, 6, 7, 9 and
16. We shall see later that this is a very poor choice. Peduzzi,
Hardy & Holford (1980) also assume a multiplicative exponential
regression model and use a stepwise procedure to identify {1,2}
as the best subset of the eight variables considered by Lawless
& Singhal. Their criterion for inclusion of a variable is based
on the score statistic for testing the current model against the
candidate model. The criterion for removal of a variable is
based on the Wald statistic. The above analyses are all based
on exponential regression models. An analysis based on the Cox model
appears as an example in the BMDP manual, where stepwise regression
based on the partial likelihood ratio criterion leads again to
the subset {1,2}.

Our analysis provides much more information
than any stepwise procedure. We report in Table 1 the best three
subsets of each size according to $CP = A^* - 15$ (here $P = 16$, so $P - 1 = 15$).
For comparison purposes, we also compute the values of $LR = A - 15$ for these models.
The agreement between $A$ and $A^*$ appears to be good, especially for
the better subsets. The value of CP for
the best subset of size $P_\alpha$ decreases initially, reaches a minimum
at $P_\alpha = 8$, and then begins to increase. The best subset is
{1,3,4,6,7,8,12,13}, which has a log partial likelihood of -140.20
compared with -138.14 for the full model. The relationship among
the best five subsets is shown in Figure 1, and summary statistics
for the best subset are given in Table 2. The subset {1,2} selected
by the majority of previous workers is far from being the best.
Its CP value of 11.04 and LR value of 10.52 are considerably larger
than the corresponding values 5.00 and 5.11 of the best subset.
The associated log partial likelihood of -148.90 is unimpressive.
The partial likelihood ratio statistic for testing the subset
{1,2} against the best subset is 17.4, which is significant at the
0.01 level. Nevertheless, we observe that the addition of an extra
variable to {1,2} does not add much. In fact, the improvement is
gradual until we reach … This explains the selection of {1,2}
by stepwise procedures. Lastly, we consider {1,2,3,5,6,7,9,16},
the set of variables used by Lawless & Singhal. Its CP value
of 17.84 and LR value of 17.66 are large. The associated log
partial likelihood is -146.47, compared with -140.20 for the best
subset, which also has eight variables, and -146.54 for the best
subset of size 4.
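The LR column of Table 1 can be reproduced from the log partial likelihoods alone. The sketch below assumes $A_\alpha = W_\alpha + 2P_\alpha$, with $W_\alpha$ twice the drop in log partial likelihood from the full model; this reading reproduces the full-model entry $LR = 2 \cdot 16 - 15 = 17.00$. The helper name `lr_entry` is hypothetical, and the inputs are the values printed in Table 1.

```python
P = 16                  # covariates in the full model
LOGLIK_FULL = -138.14   # full-model log partial likelihood (Table 1)

def lr_entry(loglik, p_alpha):
    """Table 1 'LR' column: A - (P - 1), assuming A = W + 2 * p_alpha,
    with W = 2 * (full-model log PL - submodel log PL)."""
    w = 2 * (LOGLIK_FULL - loglik)
    return w + 2 * p_alpha - (P - 1)

checks = {
    "{1,2}": (lr_entry(-148.90, 2), 10.52),
    "{1,3,4,6,7,8,12,13}": (lr_entry(-140.20, 8), 5.11),
    "{1,2,3,5,6,7,9,16}": (lr_entry(-146.47, 8), 17.66),
    "full model": (lr_entry(-138.14, 16), 17.00),
}
for name, (computed, printed) in checks.items():
    print(f"{name:>22}  computed {computed:6.2f}  Table 1 {printed:6.2f}")

# Partial likelihood ratio statistic quoted in the text for {1,2} vs the best subset
stat = 2 * (-140.20 - (-148.90))
```

A difference of one in the last digit (5.12 against 5.11 for the best subset) is consistent with rounding of the printed log partial likelihoods.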
Table 1. Best three models of each size.

P_α   Variables in model            Overall    CP       LR       Log partial
                                    position                     likelihood
  1   1                                        13.28    12.67    -150.98
      2                                        14.35    15.61    -152.45
      13                                       16.71    18.39    -153.84
  2   1,2 *                                    11.04    10.52    -148.90
      1,13                                     12.22    11.52    -149.40
      1,4                                      13.92    13.98    -150.63
  3   1,2,16                                   10.93    10.82    -148.05
      1,2,13                                   10.96    10.77    -148.03
      1,2,14                                   11.11    11.02    -148.16
  4   1,3,12,13                                9.26     9.80     -146.54
      1,12,13,14                               10.15    9.22     -146.25
      1,2,12,13                                10.23    10.63    -146.96
  5   1,2,12,13,14                             7.99     8.63     -144.96
      1,3,7,12,13                              8.59     10.17    -145.73
      1,3,4,12,13                              8.89     9.84     -145.56
  6   1,3,4,7,12,13                            7.96     9.28     -144.28
      1,2,7,12,13,14                           8.00     9.12     -144.20
      1,5,6,7,12,13                            8.10     8.95     -144.12
  7   1,3,4,6,7,12,13               5          6.45     7.59     -142.44
      1,3,4,7,8,12,13                          7.11     7.67     -142.48
      1,3,4,7,12,13,14                         8.04     9.19     -143.24
  8   1,3,4,6,7,8,12,13             1          5.00     5.11     -140.20
      1,3,4,7,8,12,13,14                       7.23     7.65     -141.47
      1,2,3,4,6,8,12,13                        7.28     8.18     -141.73
      1,2,3,5,6,7,9,16 †                       17.84    17.66    -146.47
  9   1,2,3,4,6,7,8,12,13           2          5.89     5.85     -139.57
      1,3,4,6,7,8,12,13,14          3          6.15     6.15     -139.72
      1,3,4,5,6,7,8,12,13                      6.53     6.75     -140.02
 10   1,2,3,4,6,7,8,12,13,14        4          6.33     6.32     -138.80
      1,3,4,6,7,8,12,13,14,15                  7.08     7.16     -139.22
      1,2,3,4,6,7,8,12,13,15                   7.66     7.64     -139.46
 11   1,2,3,4,6,7,8,12,13,14,15                7.23     7.23     -138.26
      1,2,3,4,5,6,7,8,12,13,16                 9.17     9.32     -139.30
      1,3,4,5,6,7,8,10,11,12,13                10.15    10.33    -139.81
 12                                            9.13     9.12     -138.21
 13                                            11.07    11.07    -138.18
 14                                            13.02    13.02    -138.15
 15                                            15.01    15.01    -138.15
 16   Full model                               17.00    17.00    -138.14

*  This is the subset selected by stepwise procedures.
†  This is Lawless & Singhal's subset.
Fig. 1. Relationship among the best five subsets.
Table 2. Summary statistics for the best subset.

Log partial likelihood = -140.20

Variable   Coefficient   Standard error   Coeff./S.E.   Exp(Coeff.)
1           1.9160       0.6112            3.1346       6.7938
3          -1.5439       0.5025           -3.0726       0.2135
4           0.9889       0.4367            2.2647       2.6883
6          -0.8171       0.3951           -2.0681       0.4417
7           1.8418       0.7701            2.3915       6.3081
8           0.8047       0.4078            1.9734       2.2360
12          0.1070       0.0311            3.4384       1.1130
13          1.5074       0.4147            3.6345       4.5149
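The last two columns of Table 2 follow from the first two: the ratio is the Wald z statistic for each coefficient, and $\exp(\text{coefficient})$ is the estimated hazard ratio for a one-unit increase in that covariate with the others held fixed, since the model is $\lambda(t; x) = \lambda_0(t)\exp(x\beta)$. A quick Python check, with values typed in from Table 2 (small last-digit differences reflect rounding of the printed coefficients):

```python
import math

# (variable, coefficient, standard error) as printed in Table 2
rows = [(1, 1.9160, 0.6112), (3, -1.5439, 0.5025), (4, 0.9889, 0.4367),
        (6, -0.8171, 0.3951), (7, 1.8418, 0.7701), (8, 0.8047, 0.4078),
        (12, 0.1070, 0.0311), (13, 1.5074, 0.4147)]

derived = {}
for var, coef, se in rows:
    z = coef / se                  # Wald z statistic (Coeff./S.E. column)
    hazard_ratio = math.exp(coef)  # Exp(Coeff.) column
    derived[var] = (z, hazard_ratio)
    print(f"{var:>3}  z = {z:7.4f}  exp(coef) = {hazard_ratio:.4f}")
```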
ABSTRACT


This paper shows that within the framework of the proportional hazards
model all subsets regression can be performed with very little computational
effort. A selection criterion based on the Wald statistic is motivated by a
cross-validation argument in which the status of one observation is changed
from uncensored to censored. This criterion is seen to be formally equivalent
to Mallows' $C_p$, and thus the problem is reduced to one readily handled by standard
statistical packages. The procedure is applied to the multiple myeloma data to
give results remarkably different from those obtained by previous workers using
stepwise procedures. New insights are gained, and the superiority of all subsets
regression over stepwise regression is clearly demonstrated.


